17 research outputs found

    Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting

    Keyword spotting (KWS) constitutes a major component of human-technology interfaces. The goals for KWS are to maximize detection accuracy at a low false-alarm (FA) rate while minimizing footprint size, latency, and complexity. Towards achieving them, we study Convolutional Recurrent Neural Networks (CRNNs). Inspired by large-scale state-of-the-art speech recognition systems, we combine the strengths of convolutional layers and recurrent layers to exploit local structure and long-range context. We analyze the effect of architecture parameters and propose training strategies to improve performance. With only ~230k parameters, our CRNN model yields acceptably low latency, and achieves 97.71% accuracy at 0.5 FA/hour for a 5 dB signal-to-noise ratio. Comment: Accepted to Interspeech 2017.
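    As a concrete illustration of the conv-then-recurrent pattern the abstract describes, a minimal PyTorch sketch follows. It assumes log-mel input features; the layer sizes, kernel shapes, and GRU widths are illustrative choices, not the paper's exact ~230k-parameter configuration.

```python
import torch
import torch.nn as nn


class CRNNKeywordSpotter(nn.Module):
    """Convolution over time-frequency features, then a GRU over time,
    then a per-utterance classification head. Sizes are illustrative."""

    def __init__(self, n_mels=40, n_classes=2, conv_channels=32,
                 rnn_hidden=64, rnn_layers=2):
        super().__init__()
        # 2-D convolution exploits local time-frequency structure.
        self.conv = nn.Sequential(
            nn.Conv2d(1, conv_channels, kernel_size=(20, 5), stride=(8, 2)),
            nn.BatchNorm2d(conv_channels),
            nn.ReLU(),
        )
        # GRU layers capture long-range temporal context.
        conv_freq_out = (n_mels - 20) // 8 + 1
        self.rnn = nn.GRU(
            input_size=conv_channels * conv_freq_out,
            hidden_size=rnn_hidden,
            num_layers=rnn_layers,
            batch_first=True,
            bidirectional=True,
        )
        self.fc = nn.Linear(2 * rnn_hidden, n_classes)

    def forward(self, x):  # x: (batch, time, n_mels) log-mel features
        x = x.unsqueeze(1).transpose(2, 3)               # (batch, 1, n_mels, time)
        x = self.conv(x)                                 # (batch, C, F', T')
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)   # (batch, T', C*F')
        out, _ = self.rnn(x)
        return self.fc(out[:, -1])                       # logits per keyword class


if __name__ == "__main__":
    model = CRNNKeywordSpotter()
    logits = model(torch.randn(4, 151, 40))  # 4 example clips, 151 frames each
    print(logits.shape)                      # torch.Size([4, 2])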

    SlimPajama-DC: Understanding Data Combinations for LLM Training

    This paper aims to understand the impacts of various data combinations (e.g., web text, Wikipedia, GitHub, books) on the training of large language models using SlimPajama. SlimPajama is a rigorously deduplicated, multi-source dataset, refined and further deduplicated to 627B tokens from the 1.2T-token RedPajama dataset contributed by Together. We term this research SlimPajama-DC, an empirical analysis designed to uncover fundamental characteristics and best practices associated with employing SlimPajama in the training of large language models. During our research with SlimPajama, two pivotal observations emerged: (1) Global deduplication vs. local deduplication. We analyze and discuss how global (across different dataset sources) and local (within a single source) deduplication affect the performance of trained models. (2) Proportions of high-quality, highly deduplicated multi-source data in the combination. To study this, we construct six configurations of the SlimPajama dataset and train each with a 1.3B Cerebras-GPT model using ALiBi and SwiGLU. Our best configuration outperforms the 1.3B model trained on RedPajama with the same number of training tokens by a significant margin. All our 1.3B models are trained on a Cerebras 16× CS-2 cluster with a total of 80 PFLOP/s in bf16 mixed precision. We further extend our findings (such as that increasing data diversity is crucial after global deduplication) to a 7B model with large-batch-size training. Our models and the separate SlimPajama-DC datasets are available at https://huggingface.co/MBZUAI-LLM and https://huggingface.co/datasets/cerebras/SlimPajama-627B. Comment: Technical report. Hugging Face: https://huggingface.co/MBZUAI-LLM and https://huggingface.co/datasets/cerebras/SlimPajama-627B.
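    The global-vs-local deduplication distinction can be made concrete with a short sketch. This is a simplified stand-in that uses exact hash matching over whitespace-normalized text; SlimPajama's actual pipeline performs large-scale fuzzy deduplication, and the toy corpora below are hypothetical.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Exact-match fingerprint over normalized text (a simplified stand-in
    for the fuzzy, MinHash-style deduplication used at dataset scale)."""
    return hashlib.sha256(" ".join(text.lower().split()).encode()).hexdigest()


def dedup_local(corpora: dict[str, list[str]]) -> dict[str, list[str]]:
    """Local deduplication: duplicates are removed only within each source."""
    out = {}
    for source, docs in corpora.items():
        seen, kept = set(), []
        for doc in docs:
            fp = fingerprint(doc)
            if fp not in seen:
                seen.add(fp)
                kept.append(doc)
        out[source] = kept
    return out


def dedup_global(corpora: dict[str, list[str]]) -> dict[str, list[str]]:
    """Global deduplication: one fingerprint set is shared across all sources,
    so a document repeated in, say, web text and books is kept only once."""
    seen = set()
    out = defaultdict(list)
    for source, docs in corpora.items():
        for doc in docs:
            fp = fingerprint(doc)
            if fp not in seen:
                seen.add(fp)
                out[source].append(doc)
    return dict(out)


if __name__ == "__main__":
    corpora = {
        "web": ["the cat sat", "the cat sat", "a unique web page"],
        "books": ["the cat sat", "a unique book passage"],
    }
    print({k: len(v) for k, v in dedup_local(corpora).items()})   # {'web': 2, 'books': 2}
    print({k: len(v) for k, v in dedup_global(corpora).items()})  # {'web': 2, 'books': 1}
```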

    Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models

    We introduce Jais and Jais-chat, new state-of-the-art Arabic-centric foundation and instruction-tuned open generative large language models (LLMs). The models are based on the GPT-3 decoder-only architecture and are pretrained on a mixture of Arabic and English texts, including source code in various programming languages. With 13 billion parameters, they demonstrate better knowledge and reasoning capabilities in Arabic than any existing open Arabic or multilingual model by a sizable margin, based on extensive evaluation. Moreover, the models are competitive in English with English-centric open models of similar size, despite being trained on much less English data. We provide a detailed description of the training, the tuning, the safety alignment, and the evaluation of the models. We release two open versions of the model -- the foundation Jais model and an instruction-tuned Jais-chat variant -- with the aim of promoting research on Arabic LLMs. Available at https://huggingface.co/inception-mbzuai/jais-13b-chat. Comment: Arabic-centric, foundation model, large language model, LLM, generative model, instruction-tuned, Jais, Jais-chat.
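    A minimal inference sketch for the released checkpoint follows, assuming the Hugging Face transformers library. The trust_remote_code flag, generation settings, and plain-string prompt are assumptions for illustration; the instruction-tuned model expects the chat prompt format documented on its model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inception-mbzuai/jais-13b-chat"  # checkpoint named in the abstract

# Assumption: the checkpoint ships a custom decoder-only architecture,
# so remote code is enabled when loading it.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
)

prompt = "ما هي عاصمة الإمارات؟"  # "What is the capital of the UAE?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```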

    Reducing the SPEC2006 Benchmark Suite for Simulation-Based Abstract Computer Architecture Research

    Present-day computer architects use advanced microarchitecture simulators to test the performance of processor designs. The simulator workloads are generally benchmarks, which are representative of specific types of real-world applications. Because microarchitecture implementations increase in complexity and the simulation workloads are required to represent complicated applications, the simulation time has greatly increased. To solve this problem, researchers are looking into ways to reduce the amount of time benchmarks run while maintaining the same workload characterization as the longer benchmarks. MinneSPEC is a representative reduction of SPEC2000, with the reduced input sets found using SimpleScalar profiling tools [1]. With the release of SPEC CPU2006, new benchmarks have been added to the SPEC benchmarking suite which will be used to evaluate performance in tomorrow's microprocessors. These benchmarks are considerably larger than SPEC2000, and using SimpleScalar to profile their workloads would take a large amount of time and effort. This paper suggests a different reduction technique which gathers profiling information using processor performance counters accessed through PAPI. Since workloads run on a native system instead of a simulator, profiling information can be gathered in a much shorter amount of time. This allows for fine-grained tuning of reduced input sets, so more representative reduced benchmarks can be found in a much shorter amount of time. Using this technique, we were able to reduce five SPEC2006 benchmarks to under 1
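    The core idea -- characterizing a workload from hardware performance counters gathered on native hardware, then checking how closely a reduced input tracks the full run -- can be sketched as follows. The paper reads counters through PAPI; this sketch substitutes the Linux perf tool as a readily scriptable stand-in, and the event list, normalization, and distance measure are illustrative choices, not the authors' methodology. The benchmark command and input files are hypothetical.

```python
import subprocess

EVENTS = ["instructions", "cycles", "branch-misses", "cache-misses"]


def counter_profile(cmd: list[str]) -> dict[str, float]:
    """Run `cmd` under perf stat and return event counts normalized per
    instruction, giving a size-independent workload characterization."""
    result = subprocess.run(
        ["perf", "stat", "-x", ",", "-e", ",".join(EVENTS), "--"] + cmd,
        capture_output=True, text=True, check=True,
    )
    counts = {}
    for line in result.stderr.splitlines():  # perf stat reports on stderr
        fields = line.split(",")
        if len(fields) < 3 or not fields[0].strip().isdigit():
            continue                          # skip headers / <not counted>
        value, name = float(fields[0]), fields[2]
        for event in EVENTS:
            if name.startswith(event):
                counts[event] = value
    instructions = counts.get("instructions", 1.0)
    return {event: counts.get(event, 0.0) / instructions for event in EVENTS}


def profile_distance(full: dict[str, float], reduced: dict[str, float]) -> float:
    """Smaller distance => the reduced input behaves more like the full run."""
    return sum(abs(full[e] - reduced[e]) for e in EVENTS)


if __name__ == "__main__":
    full = counter_profile(["./benchmark", "full_input.cfg"])        # hypothetical
    reduced = counter_profile(["./benchmark", "reduced_input.cfg"])  # hypothetical
    print("profile distance:", profile_distance(full, reduced))
```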